[reland][ROCm] preshuffled weight mm #2044

jeffdaily · 2025-04-11T17:58:57Z

Adds a SwizzleTensor subclass that wraps a Tensor and reorders its contents into the HIPBLASLT_ORDER_COL16_4R8 layout. SwizzleTensor intercepts torch.mm and replaces it with custom calls to hipBLASLt.
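To illustrate what "reordering the contents" of a weight means, here is a minimal C++ sketch of a generic tile-major pre-shuffle and its inverse. It is an illustration under stated assumptions only: `swizzle_tiles` and `unswizzle_tiles` are hypothetical names, and the layout (row-major `tile_r x tile_c` tiles, each stored contiguously) is a simplified stand-in, not the actual HIPBLASLT_ORDER_COL16_4R8 ordering defined by hipBLASLt.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical generic pre-shuffle: copy each tile_r x tile_c tile of a
// row-major rows x cols matrix into one contiguous chunk, visiting tiles in
// row-major order. Simplified stand-in, NOT the real COL16_4R8 layout.
// Assumes rows % tile_r == 0 and cols % tile_c == 0 (no padding handled).
std::vector<float> swizzle_tiles(const std::vector<float>& src,
                                 std::size_t rows, std::size_t cols,
                                 std::size_t tile_r, std::size_t tile_c) {
  std::vector<float> dst(src.size());
  std::size_t out = 0;
  for (std::size_t tr = 0; tr < rows; tr += tile_r)
    for (std::size_t tc = 0; tc < cols; tc += tile_c)
      for (std::size_t r = 0; r < tile_r; ++r)
        for (std::size_t c = 0; c < tile_c; ++c)
          dst[out++] = src[(tr + r) * cols + (tc + c)];
  return dst;
}

// Inverse of swizzle_tiles: walks the same loop nest and scatters the
// contiguous tile data back to its original row-major position.
std::vector<float> unswizzle_tiles(const std::vector<float>& src,
                                   std::size_t rows, std::size_t cols,
                                   std::size_t tile_r, std::size_t tile_c) {
  std::vector<float> dst(src.size());
  std::size_t in = 0;
  for (std::size_t tr = 0; tr < rows; tr += tile_r)
    for (std::size_t tc = 0; tc < cols; tc += tile_c)
      for (std::size_t r = 0; r < tile_r; ++r)
        for (std::size_t c = 0; c < tile_c; ++c)
          dst[(tr + r) * cols + (tc + c)] = src[in++];
  return dst;
}
```

For a 4 x 4 matrix holding 0..15 and 2 x 2 tiles, the swizzled order begins 0, 1, 4, 5, 2, 3, 6, 7. Doing such a reorder once, when the weight is wrapped, is the idea behind a pre-shuffled weight: the matmul then consumes the already reordered buffer.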

pytorch-bot · 2025-04-11T17:59:01Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2044

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[Infra] Jobs got frequently cancelled, sometimes mid-checkout

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jeffdaily · 2025-04-11T18:39:24Z

@mxz297 @jerryzh168 please re-review and kick off CI, thanks.

mxz297 · 2025-04-13T22:12:47Z

@jeffdaily "test-mps-ops" still seems to be failing to compile with

  /Users/ec2-user/runner/_work/ao/ao/torchao/csrc/rocm/swizzle/swizzle.cpp:1:10: fatal error: 'hip/hip_runtime.h' file not found
  #include <hip/hip_runtime.h>
           ^~~~~~~~~~~~~~~~~~~

I wonder if we should just guard the whole source file under #if USE_ROCM.

jeffdaily · 2025-04-14T16:28:51Z

> I wonder if we should just guard the whole source file under #if USE_ROCM

Done.
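A minimal sketch of the "guard the whole source file" approach, assuming a hypothetical ROCm-only file with a hypothetical `rocm_swizzle_available()` helper (torchao's actual swizzle.cpp is not reproduced here). Because the include itself sits inside the guard, a non-ROCm build such as the macOS test-mps-ops job never tries to open <hip/hip_runtime.h>.

```cpp
// Hypothetical ROCm-only translation unit (not torchao's real swizzle.cpp).
// Everything that touches HIP, including the include, is inside the guard,
// so non-ROCm compilers never look for <hip/hip_runtime.h>.
#if defined(USE_ROCM)
#include <hip/hip_runtime.h>

// ... HIP kernels and hipBLASLt calls would live here ...

// Hypothetical helper so callers can tell which build they got.
bool rocm_swizzle_available() { return true; }

#else

// Non-ROCm builds compile this file down to a trivial stub.
bool rocm_swizzle_available() { return false; }

#endif
```

The guard only helps when USE_ROCM is left undefined on non-ROCm builds; it does nothing for a ROCm build whose include path is incomplete.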

facebook-github-bot · 2025-04-15T18:42:42Z

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2025-04-21T16:34:31Z

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2025-04-21T17:30:17Z

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mxz297

* [ROCm][experimental] pre-shuffle weights
* add custom gemm op
* pass through swizzled
* copy paste bug causing extra matmul to execute
* correct transpose and permute logic
* swizzle.cpp is rocm-only, remove #ifndef USE_ROCM
* transpose is shallow, don't unswizzle/swizzle
* add fp8 swizzle
* remove print statement
* setup.py missing check for vec ext
* remove merge mistake
* conditionalize building sparse marlin for hip
* ruff format
* ruff check --fix
* protect swizzle.cpp inside USE_ROCM
* patch from @mxz297
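The commit "transpose is shallow, don't unswizzle/swizzle" suggests that transposing a swizzled weight only changes metadata. A minimal C++ sketch of that idea follows, using a hypothetical `SwizzledWeight` view type that shares the reordered buffer and flips a flag instead of moving data; it is an illustration, not the PR's SwizzleTensor code.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical view over an already swizzled weight buffer.
struct SwizzledWeight {
  std::shared_ptr<const std::vector<float>> data;  // swizzled storage, shared
  std::size_t rows;                                // logical shape
  std::size_t cols;
  bool transposed;  // a GEMM wrapper could map this to a transpose op
};

// Shallow transpose: swap the logical shape and flip the flag, but keep
// pointing at the same swizzled buffer (no unswizzle/re-swizzle round trip).
SwizzledWeight transpose(const SwizzledWeight& w) {
  return SwizzledWeight{w.data, w.cols, w.rows, !w.transposed};
}
```

Keeping the data untouched avoids two full reorders per transpose, which matters because torch.mm callers often transpose weights right before multiplying.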

jerryzh168 · 2025-04-22T21:21:19Z

Is this not fixed?

__w/ao/ao/pytorch/ao/torchao/csrc/rocm/swizzle/swizzle.cpp:4:10: fatal error: hip/hip_runtime.h: No such file or directory
    4 | #include <hip/hip_runtime.h>
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

jerryzh168 · 2025-04-22T21:21:24Z

https://github.com/pytorch/ao/actions/runs/14604068152/job/40968945843?pr=2103

mxz297 · 2025-04-23T14:25:37Z

@jeffdaily @jerryzh168 This is very strange. Previously, this error

__w/ao/ao/pytorch/ao/torchao/csrc/rocm/swizzle/swizzle.cpp:4:10: fatal error: hip/hip_runtime.h: No such file or directory
    4 | #include <hip/hip_runtime.h>
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

only showed up on non-AMD platforms, so we added a commit that guards the whole source file under #if USE_ROCM, and after that we no longer saw this failure. I also saw clean signals before merging.

So I am a little surprised:
(1) Why did the test failure not show up before the merge?
(2) The failure is now on AMD platforms. Jeff has tested this on the OSS platform and I have tested it inside Meta, so it feels more likely that the test platform does not have a proper ROCm setup?

@jeffdaily Are you able to reproduce these ROCm failures?

jerryzh168 · 2025-04-23T18:31:36Z

The error appears in the internal diff as well (https://www.internalfb.com/diff/D73052566). I think we should revert for now? Did this error not appear in the original PR/diff?

mxz297 · 2025-04-23T20:30:21Z

@jerryzh168 I replied in the internal diff, but that looks like a different failure, which I suspect was caused by another diff.

petrex · 2025-04-24T04:08:15Z

I am seeing the same error in the wheel build.
Maybe we are missing the proper toolchain/ROCm headers in pytorch/manylinux2_28-builder:rocm6.2.4, or just some env var? @amdfaa

HDCharles · 2025-04-24T20:06:06Z

Looking at our code, we have:

#if defined(USE_ROCM)
#include <hip/hip_bf16.h>
#include <hip/hip_fp16.h>
#include <hip/hip_runtime.h>
#endif

in tensor_core_tiled_layout.cu.

Is the HIP include here not gated correctly?

jeffdaily · 2025-04-25T19:36:22Z

That looks gated correctly. The CI build is missing -I/opt/rocm for some reason. The header files are there, but the flag is missing.
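One way to make this failure mode easier to diagnose is a compile-time check with C++17 `__has_include`, so that a ROCm build whose include path lacks the ROCm headers stops with an actionable message instead of a bare "file not found". This is a hypothetical sketch, not code from the PR: the macro and function names are made up, and /opt/rocm/include is assumed to be the default header location (ROCm places hip/hip_runtime.h under the include directory of its install root).

```cpp
// Hypothetical diagnostic (not from the PR): detect whether the HIP runtime
// header is reachable on the current include path.
#if defined(__has_include)
#if __has_include(<hip/hip_runtime.h>)
#define SKETCH_HIP_HEADERS_FOUND 1
#endif
#endif

#ifndef SKETCH_HIP_HEADERS_FOUND
#define SKETCH_HIP_HEADERS_FOUND 0
#endif

// A ROCm build that cannot see the header fails with a pointed message.
#if defined(USE_ROCM) && !SKETCH_HIP_HEADERS_FOUND
#error "USE_ROCM is defined but <hip/hip_runtime.h> is not on the include path; add the ROCm include directory (for example -I/opt/rocm/include)"
#endif

#if defined(USE_ROCM)
#include <hip/hip_runtime.h>
#endif

// Lets runtime code or tests see what the preprocessor decided.
constexpr bool hip_headers_on_include_path() {
  return SKETCH_HIP_HEADERS_FOUND == 1;
}
```

Compiling with -DUSE_ROCM but without the ROCm include directory would then stop at the #error line, which names the missing flag directly.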

jerryzh168 · 2025-04-25T23:12:44Z

torchao/__init__.py


 __all__ = [
    "dtypes",
    "autoquant",
    "optim",
    "quantize_",
+    "swizzle",


Why is this added to the top level? Should this be in prototype for now?

jerryzh168 · 2025-04-25T23:13:26Z

torchao/swizzle/__init__.py

@@ -0,0 +1,9 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.


I don't think we want to create a new folder under torchao for this tensor/op.

jeffdaily · 2025-04-30

Where would you recommend it go?

jerryzh168 · 2025-05-02

Is this a prototype? We can add it to torchao/prototype for now.

This reverts commit 2266451.

…2170)
* Revert "[reland][ROCm] preshuffled weight mm (#2044)". This reverts commit 2266451.
* Revert "Re-land "Add INT8 SDPA path for CPU" (#2093)". This reverts commit 137b079.

jeffdaily added 21 commits February 10, 2025 21:45

[ROCm][experimental] pre-shuffle weights

780aa60

add custom gemm op

308e0c9

pass through swizzled

75b6903

Merge branch 'main' into rocm_swizzle

de08451

Merge branch 'main' into rocm_swizzle

6430bbf

copy paste bug causing extra matmul to execute

5a10803

Merge branch 'main' into rocm_swizzle

eaebca0

correct transpose and permute logic

ae9ca6a

Merge branch 'main' into rocm_swizzle

7a74355

swizzle.cpp is rocm-only, remove #ifndef USE_ROCM

2dff63b

transpose is shallow, don't unswizzle/swizzle

fe461af

add fp8 swizzle

b087d92

Merge branch 'main' into rocm_swizzle

cea5355

remove print statement

640f00e

setup.py missing check for vec ext

4343c78

Merge branch 'main' into rocm_swizzle

28e5d89

remove merge mistake

8b57424

Merge branch 'main' into rocm_swizzle

95f967e

conditionalize building sparse marlin for hip

749bc7d

ruff format

ee8ce80

ruff check --fix

53000ac

pytorch-bot added the ci-no-td and module: rocm labels Apr 11, 2025

facebook-github-bot added the CLA Signed label Apr 11, 2025

jeffdaily mentioned this pull request Apr 11, 2025

[ROCm] preshuffled weight mm #1702

Merged

protect swizzle.cpp inside USE_ROCM

aa2b42c

jeffdaily added 2 commits April 21, 2025 16:25

Merge branch 'main' into rocm_swizzle

6df441f

patch from @mxz297

8a10945

mxz297 merged commit 2266451 into pytorch:main Apr 22, 2025
4 checks passed

petrex mentioned this pull request Apr 24, 2025

Disable ROCm support in the Linux wheels build workflow #2124

Merged

jerryzh168 reviewed Apr 25, 2025


atalman mentioned this pull request May 2, 2025

nightly build for mac stops on 0422 #2157

Closed

atalman added a commit to atalman/ao that referenced this pull request May 5, 2025

Revert "[reland][ROCm] preshuffled weight mm (pytorch#2044)"

0b0385d

This reverts commit 2266451.

atalman mentioned this pull request May 6, 2025

Fix linux cpu builds. Resolves nightly build for mac stops on 0422 #2170

Merged
